TOPO: Improving remote homologue recognition via identifying common protein structure framework

Authors

  • Jianwei Zhu
  • Haicang Zhang
  • Chao Wang
  • Bin Ling
  • Wei-Mou Zheng
  • Dongbo Bu
Abstract

(Extended Abstract)

1 Motivation

Protein structure prediction plays an important role in bioinformatics and biology. Traditional approaches include template-based modeling (TBM, comprising homology modeling and threading) and free modeling (FM). In particular, a threading algorithm takes a query protein sequence as input, recognizes the most likely fold, and reports the alignments of the query sequence to templates of known structure as output. Existing threading approaches mainly use the protein sequence profile, solvent accessibility, contact probability, and similar information. Threading has been successful in predicting the structure of a large number of proteins; however, existing threading approaches perform poorly on remote homologues. How to improve fold recognition for remote homologues remains a challenge in protein structure prediction.

Correspondence should be addressed to Wei-Mou Zheng and Dongbo Bu.

The sequences of remote homologues generally carry a relatively weak structural signal. However, this does not mean that the sequences contain no conserved hints of structure. The success of the multiple-template strategy implies the existence of common frameworks, i.e., regions of proteins that are conserved in both structure and sequence. Such common frameworks are presumably responsible for structural stability and are therefore conserved during evolution.

Based on this observation, we proposed a novel threading approach in three steps. First, for each template, the common structural frameworks shared by its homologous proteins were calculated. Second, unlike traditional threading methods, where the alignment is made against the whole template, we aligned the query protein sequence against a common framework first. This strategy avoids a drawback of the traditional threading approach, namely that aligning variable regions beyond the conserved motifs is prone to introducing errors. Third, the final alignments were generated by aligning the query sequence against candidate full-length templates in the family. Briefly, we ran TreeThreader[2] to build alignments of the query against the new template database and ranked the alignments by E-value for model generation. Finally, we generated models with MODELLER based on the candidate alignments. The generated models were ranked according to the dDFIRE[3] energy function.

2 Methods

For each template with known structure, all of its remote homologues are first identified by structure alignment. Then, a linear program is used to identify the common framework shared by these remote homologues. The common framework identification problem can be described as follows: given a collection of homologous proteins H = {s1, ..., sN} with lengths L1, ..., LN, the objective is to find m segments of length n with high sequence conservation and structural similarity. As an example, Fig. 1 shows the common frameworks shared by protein 3gxr A and its homologous proteins.

Figure 1: Common frameworks shared by protein 3gxr A and its homologous proteins. The common framework consists of three dispersed segments (in yellow, cyan, and green). At the conserved segments, the homologous proteins display significant sequence conservation and structural similarity.

2.1 Basic idea of the linear program

The common framework poses two requirements: significantly high sequence conservation and structural similarity. In the linear program, the objective function was designed to describe structural similarity, and the constraints were designed to describe sequence similarity. Specifically, the linear program uses a set of Boolean variables to represent the locations of conserved segments, i.e., xij = 1 denotes that in the i-th protein, the k-th segment is located at the j-th residue. The structural similarity objective and the sequence similarity constraints can then be expressed in terms of xij. The constraints were designed to represent the following requirements.

  • For any sequence, the k-th segment in the common framework is unique.
  • No two segments in a common framework overlap or cross.
  • The segments should have significantly high sequence similarity.

The integer linear programming model can be described as:

max structural similarity
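The abstract breaks off before the full integer linear program is given, so the following Python sketch does not reproduce the authors' model. It is a toy illustration of the problem as stated above: place m ordered, non-overlapping segments of length n in each protein, maximize a structural similarity score, and require a minimum sequence similarity per segment. Several things here are assumptions made only for illustration: exhaustive search replaces the ILP solver, per-residue secondary-structure labels stand in for real structural similarity, column-wise identity stands in for the sequence conservation measure, and all function names and example sequences are hypothetical.

```python
from itertools import combinations, product


def segment_placements(length, m, n):
    """All ways to place m ordered, non-overlapping segments of length n."""
    starts = range(length - n + 1)
    return [c for c in combinations(starts, m)
            if all(b - a >= n for a, b in zip(c, c[1:]))]


def column_agreement(strings, starts, n):
    """Number of aligned columns in which every string has the same symbol."""
    return sum(
        len({s[st + off] for s, st in zip(strings, starts)}) == 1
        for off in range(n)
    )


def find_common_framework(proteins, m, n, min_seq_identity):
    """Exhaustive toy search (an assumption, not the paper's ILP).

    proteins: list of (sequence, structure_labels) string pairs.
    Returns (best_score, placement), where placement[i][k] is the start
    of the k-th segment in the i-th protein.
    """
    seqs = [p[0] for p in proteins]
    labels = [p[1] for p in proteins]
    per_protein = [segment_placements(len(s), m, n) for s in seqs]
    best_score, best = -1, None
    for choice in product(*per_protein):
        score, feasible = 0, True
        for k in range(m):
            starts = [placement[k] for placement in choice]
            # Constraint: segment k must be sufficiently conserved in sequence.
            if column_agreement(seqs, starts, n) / n < min_seq_identity:
                feasible = False
                break
            # Objective: structural agreement summed over segments.
            score += column_agreement(labels, starts, n)
        if feasible and score > best_score:
            best_score, best = score, choice
    return best_score, best


# Hypothetical toy input: (sequence, secondary-structure labels).
proteins = [
    ("GKLVPPDEFS", "CHHHCCEEEC"),
    ("KLVTTTDEFA", "HHHCCCEEEC"),
    ("SAKLVDEFQQ", "CCHHHEEECC"),
]
score, placement = find_common_framework(proteins, m=2, n=3, min_seq_identity=1.0)
print(score, placement)  # 6 ((1, 6), (0, 6), (2, 5))
```

Exhaustive search is only practical for tiny inputs such as this one; the number of candidate placements grows exponentially with the number of proteins, which is presumably why the source formulates the problem as an integer linear program instead.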




Publication date: 2015